Automate Data Pipeline for Public Notice Aggregator

freelancer.com 🟠 2026-05-11

👤 Client: 🇮🇳 Chhatrapati Sambhajinagar, India (member since 2025-11-29)
💰 Price: $92 (average bid)
🚩 Problem: Need to extract public notices from e-papers, store them systematically, and push data to Firebase Firestore.
📦 Existing: 500+ E-paper URLs, Google Drive API, Google Cloud Vision API, Firebase Firestore, Google Cloud Functions

Specifications:

[Target] Extract the daily 'Public Notice' images/sections from the 500+ e-papers in the provided URL list.
[Method] Use Python with web scraping libraries (Selenium, Scrapy) for automated scraping; use Google Cloud Vision API for OCR in Marathi and Hindi languages.
[UI/UX] Not applicable
[Stack] Python, Selenium, Scrapy, Google Drive API, Google Cloud Vision API, Firebase Firestore, Google Cloud Functions
[Security] Ensure data privacy and security during transmission and storage. Use secure authentication methods with Firebase.
[Format] Store images in organized folders (Date/District wise) on Google Drive; push formatted text to Firebase Firestore.
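The [Format] requirement can be sketched with a few plain-Python helpers. This is only an illustration under assumptions: the `Notice` fields, the helper names, and the choice of a hash-based document ID are hypothetical and not part of the posting. The actual Drive and Firestore calls are left out; the helpers only prepare the folder path and the document data.

```python
from dataclasses import dataclass, asdict
from datetime import date
import hashlib


@dataclass(frozen=True)
class Notice:
    """One extracted public notice (field names are illustrative assumptions)."""
    paper: str        # e-paper name
    district: str
    published: date
    source_url: str
    text: str         # cleaned OCR text


def drive_folder_parts(notice: Notice) -> list[str]:
    """Folder names from the Drive root down: date first, then district."""
    return [notice.published.isoformat(), notice.district.strip().title()]


def firestore_doc_id(notice: Notice) -> str:
    """Deterministic ID, so re-running a day's job overwrites rather than duplicates."""
    key = f"{notice.published.isoformat()}|{notice.source_url}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:20]


def firestore_payload(notice: Notice) -> dict:
    """Plain dict that could be passed to a Firestore document write."""
    data = asdict(notice)
    data["published"] = notice.published.isoformat()  # store the date as a string
    return data
```

Deriving the document ID from the date and source URL is one possible design choice: it makes the daily job idempotent, at the cost of assuming each source URL yields at most one notice per day.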

Workflow:

1. Schedule daily scraping tasks using a cron job or task scheduler.
2. Use Python and web scraping libraries to extract 'Public Notice' sections from e-papers based on the provided URL list.
3. Upload extracted images to Google Drive via API, organizing them by date and district.
4. Apply OCR using Google Cloud Vision API for Marathi and Hindi languages.
5. Clean and format the OCR text output using the Gemini API, if available.
6. Push final formatted text with source line to Firebase Firestore database.
7. Deploy script on Google Cloud Functions to optimize costs.
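Steps 2 to 6 above could be wired together roughly as follows. This is a minimal sketch, not the client's implementation: `run_daily` and its callable parameters are hypothetical names, and scraping, Drive upload, cleanup, and Firestore writes are passed in as functions rather than implemented. The OCR helper assumes the `google-cloud-vision` package and default application credentials, and passes Marathi (`mr`) and Hindi (`hi`) as language hints.

```python
from typing import Callable, Iterable
import logging

log = logging.getLogger("notice_pipeline")


def ocr_marathi_hindi(image_bytes: bytes) -> str:
    """OCR one image with Cloud Vision, hinting Marathi ("mr") and Hindi ("hi")."""
    from google.cloud import vision  # lazy import; requires google-cloud-vision

    client = vision.ImageAnnotatorClient()
    response = client.document_text_detection(
        image=vision.Image(content=image_bytes),
        image_context={"language_hints": ["mr", "hi"]},
    )
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def run_daily(
    urls: Iterable[str],
    scrape: Callable[[str], list[bytes]],   # step 2: URL -> notice images
    upload: Callable[[str, bytes], None],   # step 3: store the image on Drive
    ocr: Callable[[bytes], str],            # step 4: image -> raw text
    clean: Callable[[str], str],            # step 5: optional text cleanup
    store: Callable[[str, str], None],      # step 6: write text plus its source
) -> dict:
    """Run steps 2-6 per URL; a failure on one e-paper does not stop the rest."""
    stats = {"ok": 0, "failed": 0}
    for url in urls:
        try:
            for image in scrape(url):
                upload(url, image)
                store(url, clean(ocr(image)))
            stats["ok"] += 1
        except Exception:
            log.exception("failed: %s", url)
            stats["failed"] += 1
    return stats
```

With 500+ sources, isolating failures per URL keeps one broken e-paper layout from aborting the whole daily run, and the returned counts give the scheduler something to log or alert on.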
